🖥 PDF Craft: a Python library for converting PDFs (primarily scanned books) to Markdown and EPUB, using local AI models and an LLM to structure the content
GitHub

Key features

- Extracting text and layout
Combines DocLayout-YOLO with its own algorithms to detect and filter out running headers and footers, footnotes, and page numbers

- Local OCR
Recognizes the text on each page with OnnxOCR; GPU acceleration (CUDA) is supported

- Determining reading order
Uses LayoutReader to arrange the text in the order a human reader would follow

- Converting to Markdown
Generates a .md file with relative links to images (illustrations, tables, formulas) stored in the assets folder
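
To illustrate what a relative link to the assets folder means in practice, here is a minimal sketch. It is not PDF Craft's own code: the helper name image_link, the file names, and the out/assets layout are assumptions made up for this example.

```python
import posixpath


def image_link(md_path: str, image_path: str, alt: str = "") -> str:
    """Build a Markdown image reference whose path is relative to the .md file.

    Hypothetical helper for illustration only; not part of PDF Craft.
    """
    # Folder that contains the Markdown file ("." if it sits in the current folder)
    md_dir = posixpath.dirname(md_path) or "."
    # Path of the image as seen from that folder
    rel = posixpath.relpath(image_path, start=md_dir)
    return f"![{alt}]({rel})"


print(image_link("out/book.md", "out/assets/figure_1.png", "Figure 1"))
```

Because the link is relative, the .md file and its assets folder can be moved together without breaking the images.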

Installation and requirements
Python ≥ 3.10 (3.10.16 recommended).

Run pip install pdf-craft and pip install onnxruntime==1.21.0 (or onnxruntime-gpu==1.21.0 for CUDA).

For the EPUB pipeline, you need access to an LLM service (for example, DeepSeek).
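
The requirements above can be checked before installing with a small script. This is only a sketch: the function names are invented for the example, it uses nothing but the Python standard library, and it does not call PDF Craft itself.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 10)    # minimum Python version stated in the post
ORT_VERSION = "1.21.0"  # ONNX Runtime version stated in the post


def python_ok(version=None) -> bool:
    """True if the given (major, minor) version, or the running one, is >= 3.10."""
    version = sys.version_info[:2] if version is None else tuple(version[:2])
    return version >= MIN_PYTHON


def runtime_requirement(use_cuda: bool) -> str:
    """Return the pip requirement for the CPU or CUDA build of ONNX Runtime."""
    name = "onnxruntime-gpu" if use_cuda else "onnxruntime"
    return f"{name}=={ORT_VERSION}"


def installed_version(package: str):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("Install with: pip install pdf-craft", runtime_requirement(use_cuda=False))
    print("pdf-craft installed:", installed_version("pdf-craft"))
```

Pass use_cuda=True to get the onnxruntime-gpu requirement instead.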

🟡 GitHub


#پایتون #Python #library

🆔 @Python4all_pro



tg-me.com/Python4all_pro/1585


BY پایتون ( Machine Learning | Data Science )



